Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

Authors

  • Ronald Ortner
  • Odalric-Ambrym Maillard
  • Daniil Ryabko
Abstract

We consider a reinforcement learning setting introduced in [6] where the learner does not have explicit access to the states of the underlying Markov decision process (MDP). Instead, she has access to several models that map histories of past interactions to states. Here we improve over the known regret bounds in this setting and, more importantly, generalize to the case where the set of models given to the learner does not contain a true model inducing an MDP representation, but only approximations of it. We also give improved error bounds for state aggregation.
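To fix intuition for this setting, here is a minimal, self-contained sketch (not the authors' algorithm, which selects representations via regret and confidence bounds): the learner sees only a stream of observations, each candidate model maps the history so far to a state, and a representation under which the process is approximately Markov can be recognized, for instance, by its one-step predictive performance. The process, the two candidate models, and the log-loss criterion below are all illustrative assumptions.

```python
# Toy sketch of the state-representation setting: candidate models map
# histories to states; we score each by one-step predictive log-loss.
import math
import random
from collections import Counter, defaultdict

random.seed(0)

# Toy process, Markov of order 2 over observations {0, 1}:
# next observation = XOR of the last two, flipped with probability 0.1.
obs = [0, 1]
for _ in range(4000):
    nxt = obs[-2] ^ obs[-1]
    obs.append(nxt if random.random() > 0.1 else 1 - nxt)

# Candidate state-representation models: history -> state.
models = {
    "last obs (too coarse)": lambda h: h[-1],
    "last two obs (Markov)": lambda h: (h[-2], h[-1]),
}

def log_loss(model, seq, split):
    """Fit per-state next-observation counts on seq[:split], then
    return the average prediction log-loss on the remainder."""
    counts = defaultdict(Counter)
    for t in range(1, split):
        counts[model(seq[: t + 1])][seq[t + 1]] += 1
    total, n = 0.0, 0
    for t in range(split, len(seq) - 1):
        c = counts[model(seq[: t + 1])]
        p = (c[seq[t + 1]] + 1) / (sum(c.values()) + 2)  # Laplace smoothing
        total -= math.log(p)
        n += 1
    return total / n

for name, m in models.items():
    print(f"{name:<24} log-loss {log_loss(m, obs, len(obs) // 2):.3f}")
```

Running this prints a clearly lower log-loss for the order-2 representation, the toy analogue of a model that induces an MDP representation rather than a mere approximation of one.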

Similar articles

Near Optimal Behavior via Approximate State Abstraction

The combinatorial explosion that plagues planning and reinforcement learning (RL) algorithms can be moderated using state abstraction. Prohibitively large task representations can be condensed such that essential information is preserved, and consequently, solutions are tractably computable. However, exact abstractions, which treat only fully-identical situations as equivalent, fail to present ...
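As a concrete illustration of approximate (rather than exact) abstraction, the sketch below aggregates states whose optimal Q-values agree within epsilon for every action, one of the approximate abstraction types studied in this line of work. The Q-table, epsilon, and the greedy clustering are illustrative assumptions, not the paper's exact construction.

```python
# Approximate Q*-irrelevance abstraction: merge states whose optimal
# Q-values differ by at most EPS under every action.
EPS = 0.1

# Toy optimal Q-values: Q[state][action]
Q = {
    "s0": {"a": 1.00, "b": 0.50},
    "s1": {"a": 1.05, "b": 0.52},   # within EPS of s0 -> same abstract state
    "s2": {"a": 0.20, "b": 0.90},
}

def close(s, t):
    return all(abs(Q[s][a] - Q[t][a]) <= EPS for a in Q[s])

# Note: EPS-closeness is not transitive, so greedy clustering against
# existing representatives is just one of several reasonable choices.
abstract_state = {}           # ground state -> cluster representative
for s in Q:
    rep = next((r for r in set(abstract_state.values()) if close(s, r)), s)
    abstract_state[s] = rep

print(abstract_state)   # {'s0': 's0', 's1': 's0', 's2': 's2'}
```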

Exploration in Metric State Spaces

We present Metric-E3, a provably near-optimal algorithm for reinforcement learning in Markov decision processes in which there is a natural metric on the state space that allows the construction of accurate local models. The algorithm is a generalization of the E3 algorithm of Kearns and Singh, and assumes a black box for approximate planning. Unlike the original E3, Metric-E3 finds a near-optimal policy i...
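A hedged sketch of the core idea, not of Metric-E3 itself: given a metric on states, a local model of the dynamics at a query point can be built by averaging experienced transitions that started nearby, and regions with no nearby experience are exactly the ones an E3-style algorithm would set out to explore. The radius r, the transitions, and all names are illustrative.

```python
# Local dynamics model from a metric: average nearby observed transitions.
import math

# Experienced transitions: (state, next_state), with states in R^2 here.
transitions = [((0.0, 0.0), (0.1, 0.0)),
               ((0.1, 0.1), (0.2, 0.1)),
               ((5.0, 5.0), (4.8, 5.0))]

def dist(s, t):
    return math.hypot(s[0] - t[0], s[1] - t[1])

def local_model(query, r=0.5):
    """Predict the next state at `query` by averaging the displacement of
    transitions whose start lies within metric distance r of `query`."""
    near = [(s, s2) for (s, s2) in transitions if dist(query, s) <= r]
    if not near:
        return None  # unexplored region: explore here before planning
    dx = sum(s2[0] - s[0] for s, s2 in near) / len(near)
    dy = sum(s2[1] - s[1] for s, s2 in near) / len(near)
    return (query[0] + dx, query[1] + dy)

print(local_model((0.05, 0.05)))   # averages the two nearby transitions
print(local_model((9.0, 9.0)))     # None: no local data yet
```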

Barycentric Interpolators for Continuous Space and Time Reinforcement Learning

In order to find the optimal control of continuous state-space and time reinforcement learning (RL) problems, we approximate the value function (VF) with a particular class of functions called the barycentric interpolators. We establish sufficient conditions under which an RL algorithm converges to the optimal VF, even when we use approximate models of the state dynamics and the reinforcement fu...
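The interpolation scheme itself is easy to state: inside a simplex of the discretization mesh (a triangle in 2D), the value at a point is the convex combination of the vertex values weighted by the point's barycentric coordinates, which makes the interpolated VF exact for affine functions. A minimal sketch, with an illustrative triangle, vertex values, and query point:

```python
# Barycentric interpolation of a value function over one triangular cell.
def barycentric(p, a, b, c):
    """Barycentric coordinates (wa, wb, wc) of p in triangle (a, b, c);
    they are nonnegative and sum to 1 when p lies inside."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb

# Vertices of one mesh cell and the value-function samples stored there.
A, B, C = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
V = {A: 0.0, B: 1.0, C: 2.0}

p = (0.25, 0.25)
wa, wb, wc = barycentric(p, A, B, C)
value = wa * V[A] + wb * V[B] + wc * V[C]   # convex combination of vertices
print(round(value, 3))   # 0.75: linear in p, exact for affine functions
```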

Flexible State-dependant Machine Scheduling Problems Using Reinforcement Learning

This paper presents a simulation-based optimization methodology called reinforcement learning (RL) and suggests a neural approach to approximate the values when the systems under study are complex and involve large-scale sequential decision-making tasks. Computer-simulation-based reinforcement learning (RL) methods of stochastic approximation have been proposed in recent years as viable alterna...
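For concreteness, the sketch below shows the general technique the abstract points to, value approximation trained from simulated transitions, using semi-gradient TD(0) with a linear feature map in place of a neural network to keep the example dependency-free; the toy chain, features, and step size are illustrative assumptions.

```python
# Value approximation from simulated episodes: semi-gradient TD(0) with a
# small parametric function standing in for the paper's neural approximator.
import random

random.seed(0)
GAMMA, ALPHA = 0.9, 0.05
N = 50                                  # 50-state chain, reward at the end

def features(s):
    x = s / (N - 1)
    return [1.0, x, x * x]              # tiny polynomial feature map

w = [0.0, 0.0, 0.0]                     # weights of the approximator

def value(s):
    return sum(wi * fi for wi, fi in zip(w, features(s)))

for _ in range(200):                    # simulated episodes
    s = 0
    while s < N - 1:
        s2 = min(s + random.choice([1, 2]), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        target = r + (0.0 if s2 == N - 1 else GAMMA * value(s2))
        err = target - value(s)         # TD error
        w = [wi + ALPHA * err * fi for wi, fi in zip(w, features(s))]
        s = s2

print([round(value(s), 2) for s in (0, 25, 48)])  # values rise toward goal
```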

Publication date: 2014